We can load the core SNP alignment using the ape package.
I have also included a csv file with the Global Pneumococcal Sequencing Cluster (GPSC) classifications to help with visualisation. This can be loaded using the following command.
We can also load a phylogeny built using Gubbins.
Lets also load some metadata on which isolates are resistant to Penicillin
and finally the dates at which the genomes were sampled
Use the RCandyVis function to visualise the recombination tracts inferred using Gubbins. You will need the recombination predictions ./data/core_snps.recombination_predictions.gff and the annotated reference genome ./data/Streptococcus_pneumoniae_ATCC_700669.gff.
We will now investigate transmission between the samples.
Starting with the most simple approach, first calculate the SNP distance between genomes using ape’s read.dna and dist.dna functions.
Plot the distribution of pairwise distances using the hist function
A common SNP distance threshold for identifying potential transmission links is 12 in Streptococcus pneumoniae. Cluster the isolates if they are within 12 SNPs using single linkage clustering. This can be achieved using the hclust and cutree functions.
Plot the results against the phylogeny using ggtree and facet_plot
So far we have completely ignored the date samples were taken. We can improve this using the TransCluster function. Here we will use the original R package but if you are clustering very large alignments fasttranscluster can be a good alternative.
You will need to convert the dates to numeric years prior to running TransCluster.
Run TransCluster. We assume a generation time of 2 months (6 per/year) and a substitution rate of 5.3 snps/year. You will need the createModel, setParams and setTransThresholds functions to set up the model.
Generate the clusters using the setCutoffs and makeTransClusters functions.
Plot the results against the phylogeny using ggtree and facet_plot.
So far we have only considered pairwise relationships. We can often do better than this by building a phylogeny. To account for the time saples were taken we need to create a dated tree.
Here we focus on a smaller cluster where we are more likely to get an accurate dating of the tree.
Use the roottotip function to test for temporal signal
Now run Nactdater using the bactdate function and plot the result.
So far we have ignored the fact that Gubbins has filtered out parts of the alignment to deal with recombination. Use the loadGubbins function to re-run the analysis taking recombination into account. Use the drop.tip.useRec function to filter the tree to the same set of samples. You will need to use the ./data/core_snps prefix to do this.
Now we have a dated tree we can infer a transmission network using TransPhylo. As with TransCluster we will assume a mean generation time of 2 months (2/12 years) and a standard deviation of 1.
First convert the phylo object into the correct format using the ptreeFromPhylo function and generate a plot. We will assume that the last sample was taken in 2010.
We can now infer the transmission network using the inferTTree function. It is often better to use the slower starting function (optiStart=1) and a good starting point for the number of mcmcInterations is 10,000. Let’s assume the date sampling finished was 2010.5.
##
|
| | 0%
|
| | 1%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|== | 4%
|
|=== | 4%
|
|=== | 5%
|
|==== | 5%
|
|==== | 6%
|
|===== | 6%
|
|===== | 7%
|
|===== | 8%
|
|====== | 8%
|
|====== | 9%
|
|======= | 9%
|
|======= | 10%
|
|======= | 11%
|
|======== | 11%
|
|======== | 12%
|
|========= | 12%
|
|========= | 13%
|
|========= | 14%
|
|========== | 14%
|
|========== | 15%
|
|=========== | 15%
|
|=========== | 16%
|
|============ | 16%
|
|============ | 17%
|
|============ | 18%
|
|============= | 18%
|
|============= | 19%
|
|============== | 19%
|
|============== | 20%
|
|============== | 21%
|
|=============== | 21%
|
|=============== | 22%
|
|================ | 22%
|
|================ | 23%
|
|================ | 24%
|
|================= | 24%
|
|================= | 25%
|
|================== | 25%
|
|================== | 26%
|
|=================== | 26%
|
|=================== | 27%
|
|=================== | 28%
|
|==================== | 28%
|
|==================== | 29%
|
|===================== | 29%
|
|===================== | 30%
|
|===================== | 31%
|
|====================== | 31%
|
|====================== | 32%
|
|======================= | 32%
|
|======================= | 33%
|
|======================= | 34%
|
|======================== | 34%
|
|======================== | 35%
|
|========================= | 35%
|
|========================= | 36%
|
|========================== | 36%
|
|========================== | 37%
|
|========================== | 38%
|
|=========================== | 38%
|
|=========================== | 39%
|
|============================ | 39%
|
|============================ | 40%
|
|============================ | 41%
|
|============================= | 41%
|
|============================= | 42%
|
|============================== | 42%
|
|============================== | 43%
|
|============================== | 44%
|
|=============================== | 44%
|
|=============================== | 45%
|
|================================ | 45%
|
|================================ | 46%
|
|================================= | 46%
|
|================================= | 47%
|
|================================= | 48%
|
|================================== | 48%
|
|================================== | 49%
|
|=================================== | 49%
|
|=================================== | 50%
|
|=================================== | 51%
|
|==================================== | 51%
|
|==================================== | 52%
|
|===================================== | 52%
|
|===================================== | 53%
|
|===================================== | 54%
|
|====================================== | 54%
|
|====================================== | 55%
|
|======================================= | 55%
|
|======================================= | 56%
|
|======================================== | 56%
|
|======================================== | 57%
|
|======================================== | 58%
|
|========================================= | 58%
|
|========================================= | 59%
|
|========================================== | 59%
|
|========================================== | 60%
|
|========================================== | 61%
|
|=========================================== | 61%
|
|=========================================== | 62%
|
|============================================ | 62%
|
|============================================ | 63%
|
|============================================ | 64%
|
|============================================= | 64%
|
|============================================= | 65%
|
|============================================== | 65%
|
|============================================== | 66%
|
|=============================================== | 66%
|
|=============================================== | 67%
|
|=============================================== | 68%
|
|================================================ | 68%
|
|================================================ | 69%
|
|================================================= | 69%
|
|================================================= | 70%
|
|================================================= | 71%
|
|================================================== | 71%
|
|================================================== | 72%
|
|=================================================== | 72%
|
|=================================================== | 73%
|
|=================================================== | 74%
|
|==================================================== | 74%
|
|==================================================== | 75%
|
|===================================================== | 75%
|
|===================================================== | 76%
|
|====================================================== | 76%
|
|====================================================== | 77%
|
|====================================================== | 78%
|
|======================================================= | 78%
|
|======================================================= | 79%
|
|======================================================== | 79%
|
|======================================================== | 80%
|
|======================================================== | 81%
|
|========================================================= | 81%
|
|========================================================= | 82%
|
|========================================================== | 82%
|
|========================================================== | 83%
|
|========================================================== | 84%
|
|=========================================================== | 84%
|
|=========================================================== | 85%
|
|============================================================ | 85%
|
|============================================================ | 86%
|
|============================================================= | 86%
|
|============================================================= | 87%
|
|============================================================= | 88%
|
|============================================================== | 88%
|
|============================================================== | 89%
|
|=============================================================== | 89%
|
|=============================================================== | 90%
|
|=============================================================== | 91%
|
|================================================================ | 91%
|
|================================================================ | 92%
|
|================================================================= | 92%
|
|================================================================= | 93%
|
|================================================================= | 94%
|
|================================================================== | 94%
|
|================================================================== | 95%
|
|=================================================================== | 95%
|
|=================================================================== | 96%
|
|==================================================================== | 96%
|
|==================================================================== | 97%
|
|==================================================================== | 98%
|
|===================================================================== | 98%
|
|===================================================================== | 99%
|
|======================================================================| 99%
|
|======================================================================| 100%
Investigate whether the algorithm has converged by plotting the resulting object. We are looking to see that the MCMC chains have converged to consistent values.
Now plot the resulting transmission networks using the medTTree and extractTTree functions.
Finally, generate a pairwise transmission matrix from the inferred network using the computeMatTDist and levelplot functions.